ABSTRACT


Cloud computing, which is built on the Internet, is a powerful computing architecture that provides users with information technology capabilities as a service. It gives them access to these services without requiring specialized knowledge of, or control over, the underlying infrastructure. Fault tolerance comprises all the techniques needed to keep a cloud system operational and reliable when faults occur. Its main advantages in cloud computing include failure recovery, lower costs, and improved performance criteria. In this paper, we investigate the different techniques that are used for fault tolerance in cloud computing.
1. INTRODUCTION

The growth of cloud computing over the past few years is arguably one of the major developments in computing history. While much current research focuses on the technology itself, there is an equally urgent need to understand the business-related issues surrounding cloud computing, in which applications and services run on a distributed network using the available resources. When software and applications run in the cloud, cloud computing provides an abstract representation of the physical systems [4]. Cloud computing has emerged as a new computing paradigm whose main advantage is that it provides reliability, low cost, high availability, scalability, and flexibility for end users [2].


How to cite this paper: Ya Min | Khin Myat Nwe Win | Aye Mya Sandar "An Investigation of Fault Tolerance Techniques in Cloud Computing" Published in International Journal of Trend in Scientific Research and Development (IJTSRD), ISSN: 2456-6470, Volume-3 | Issue-5, August 2019, pp.1455-1457, https://doi.org/10.31142/ijtsrd26611

Copyright © 2019 by author(s) and International Journal of Trend in Scientific Research and Development Journal. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0) (http://creativecommons.org/licenses/by/4.0)

Both performance and reliability play an important role in cloud computing. If service reliability is low, the cloud service crashes frequently, which in turn reduces the number of customers and results in a loss for the service provider. If the service is reliable but its efficiency is low, users who request services must wait a long time, which also disappoints them. Both concerns are therefore directly related to fault tolerance [3] [2], so that the availability and reliability of the system are not lost.

A fault-tolerant system is one that, when a fault occurs, can tolerate it and continue to work. It is helpful to define error first, because this word brings two related terms to mind: fault and failure. However, the three terms differ from one another. Failure: A failure occurs when a system does not function as expected; if misbehavior causes the system to fail to perform at least one of its functions correctly, the system is in a failed (malfunctioning) state.
Fault tolerance is one of the important issues in cloud computing and covers all the techniques needed to enable the system to tolerate the software faults that remain in it after development. Fault-tolerance techniques provide reliability and validity in the cloud environment. The main advantages of implementing fault tolerance techniques in cloud computing are failure recovery, low cost, improved performance criteria, and so on [5].

2. Fault tolerance 

Fault tolerance is a property of a system that prevents a computer system or network device from failing because of faults or failures during execution. It includes effective steps to keep such errors or failures from disrupting the system [6]. A fault-tolerant system can continue to provide the service in question efficiently even if one or more faults or failures occur in its components.


Fault: A fault is the cause of a failure in the system, that is, a physical defect or a malfunction of a hardware or software component.

Bug: A bug is what causes an error in the system [1].
Failures, faults, and bugs may occur in applications, virtual machines, and even hardware. The system must be able to handle the fault and continue to operate, so the problem is addressed in two steps:

Fault detection: Before any evaluation can be made, the first step the system must perform is to identify the faulty functions.

Fault Repair: After the system detects a fault, the next step is to avoid the fault or to repair it.
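The two steps can be illustrated with a heartbeat monitor. This is a minimal sketch under assumptions that the paper does not state: each node reports periodic heartbeats, a node is treated as faulty once its last heartbeat is older than a fixed timeout (5 seconds here, an arbitrary choice), and "repair" simply means restarting the node. The names Node, detect_faults, repair and monitor_once are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    """Hypothetical cloud node (e.g. a virtual machine) that sends heartbeats."""
    name: str
    last_heartbeat: float = 0.0   # time of the most recent heartbeat, in seconds
    restarts: int = 0

    def heartbeat(self, now: float) -> None:
        self.last_heartbeat = now


def detect_faults(nodes: List[Node], now: float, timeout: float) -> List[Node]:
    """Fault detection: suspect every node whose heartbeat is older than `timeout`."""
    return [n for n in nodes if now - n.last_heartbeat > timeout]


def repair(node: Node, now: float) -> None:
    """Fault repair, assumed here to be a restart, which also resets the heartbeat."""
    node.restarts += 1
    node.last_heartbeat = now


def monitor_once(nodes: List[Node], now: float, timeout: float = 5.0) -> List[str]:
    """One detect-then-repair pass; returns the names of the repaired nodes."""
    faulty = detect_faults(nodes, now, timeout)
    for node in faulty:
        repair(node, now)
    return [n.name for n in faulty]


vm_a, vm_b = Node("vm-a"), Node("vm-b")
vm_a.heartbeat(now=9.0)                      # vm-a is alive; vm-b silent since t = 0
print(monitor_once([vm_a, vm_b], now=10.0))  # ['vm-b']
```

A timeout-based detector can only suspect a fault; a slow but healthy node would also be restarted, so the timeout trades detection speed against false alarms.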
3. Metrics for Fault Tolerance in Cloud Computing

According to [7], existing fault tolerance techniques in cloud computing consider various parameters: throughput, response time, scalability, performance, availability, usability, reliability, security, and associated overhead.

> Throughput: The number of tasks whose execution has been completed per unit of time. The throughput of a system should be high.

> Response Time: The time taken by an algorithm to respond; this value should be minimized.

> Scalability: The number of nodes in a system should not affect the fault tolerance capacity of the algorithm.

> Performance: This parameter checks the effectiveness of the system. The performance of the system has to be improved at a reasonable cost, e.g. by reducing the response time while keeping delays acceptable.

> Availability: The availability of a system is directly proportional to its reliability. It is the probability that an item is functioning at a given instant of time under defined conditions.

> Usability: The extent to which a product can be used by 
a user to achieve goals with effectiveness, efficiency, and 
satisfaction. 

> Reliability: This aspect aims to give correct or acceptable results within a time-bounded environment.

> Overhead Associated: The overhead incurred while implementing an algorithm. Overheads can be imposed by task migration and by inter-process or inter-processor communication. For a fault tolerance technique to be efficient, the overheads should be minimized.

> Cost effectiveness: Here the cost is defined only as a monetary cost [8].
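Several of these metrics can be computed directly from a service log. The sketch below is illustrative only: the TaskRecord log format and its values are made up, and the availability function uses the standard steady-state formula A = MTBF / (MTBF + MTTR) (mean time between failures and mean time to repair), which is a common textbook definition rather than one given in this paper.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TaskRecord:
    """Hypothetical log entry for one task handled by a cloud service."""
    submitted: float            # submission time, in seconds
    completed: Optional[float]  # completion time, or None if the task never finished


def throughput(records: List[TaskRecord], window: float) -> float:
    """Completed tasks per second over an observation window of `window` seconds."""
    done = sum(1 for r in records if r.completed is not None)
    return done / window


def mean_response_time(records: List[TaskRecord]) -> float:
    """Average submission-to-completion time, over completed tasks only."""
    times = [r.completed - r.submitted for r in records if r.completed is not None]
    return sum(times) / len(times) if times else float("nan")


def availability(mtbf: float, mttr: float) -> float:
    """Steady-state availability A = MTBF / (MTBF + MTTR)."""
    return mtbf / (mtbf + mttr)


log = [TaskRecord(0.0, 0.5), TaskRecord(1.0, 1.25), TaskRecord(2.0, None)]
print(throughput(log, window=10.0))        # 0.2 tasks per second
print(mean_response_time(log))             # 0.375 seconds
print(availability(mtbf=999.0, mttr=1.0))  # 0.999
```

The unfinished third task lowers throughput but is excluded from the response-time average, which is one reason the two metrics should be reported together.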

4. Types of fault tolerance 

Fault tolerance can be classified into two categories: hardware fault tolerance and software fault tolerance [1].

1. Hardware fault tolerance 

One of the main goals of fault tolerance is to build computer systems that can recover automatically when random faults occur in hardware components. Methods developed for this purpose generally partition a computational system into several modules, each of which is backed by redundant modules. Therefore, if a failure occurs in one of the modules, the backup module continues to work. Hardware fault-tolerant methods are of two types: fault masking and dynamic recovery [9].

Fault masking: Fault masking is a structural redundancy technique that completely masks faults within a set of redundant components. A number of identical components execute the same function, and their outputs are voted on to remove errors caused by a defective module [1].
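The voting step of this structural-redundancy technique can be sketched with the classic triple modular redundancy (TMR) voter, in which every output bit takes the value held by at least two of three identical modules. This is a software simulation under assumptions: the protected module (increment) is a hypothetical stand-in, and a defective module is simulated by XOR-ing its output with a fault mask.

```python
from typing import Callable, Optional


def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-out-of-3 majority: each output bit takes the value
    held by at least two of the three module outputs."""
    return (a & b) | (a & c) | (b & c)


def tmr(module: Callable[[int], int], x: int,
        faulty_index: Optional[int] = None, fault_mask: int = 0) -> int:
    """Run three identical copies of `module` on the same input and vote.
    To simulate a defective module, the output of copy `faulty_index`
    is corrupted by XOR-ing it with `fault_mask`."""
    outputs = []
    for i in range(3):
        y = module(x)
        if i == faulty_index:
            y ^= fault_mask  # injected fault: flip the bits set in fault_mask
        outputs.append(y)
    return majority_vote(*outputs)


def increment(x: int) -> int:
    """Hypothetical module being protected."""
    return x + 1


print(tmr(increment, 41, faulty_index=1, fault_mask=0b1010))  # 42
```

Because the two healthy copies agree on every bit, any corruption confined to one copy is outvoted, however many of its bits are wrong; two faulty copies, however, can defeat the vote.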

Dynamic recovery: Dynamic recovery is used when only one copy of a computation runs at a time. This technique performs self-repair. As in fault masking, additional spare components are used, here to take over as backups [1].
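Dynamic recovery can be sketched as standby sparing: only the active module runs, and when it reports a fault the next spare takes over and re-executes the computation. This is a minimal sketch assuming that each module detects its own errors and signals them by raising the hypothetical ModuleFault exception; run_with_spares, broken_primary and spare are illustrative names.

```python
from typing import Callable, List, Tuple


class ModuleFault(Exception):
    """Raised by a module when its self-check detects an error."""


def run_with_spares(modules: List[Callable[[int], int]], x: int) -> Tuple[int, int]:
    """Standby sparing: run one module at a time; on a detected fault,
    retire it and let the next spare re-execute the computation.
    Returns the result and the index of the module that produced it."""
    for index, module in enumerate(modules):
        try:
            return module(x), index
        except ModuleFault:
            continue  # reconfigure: switch to the next spare
    raise RuntimeError("all spare modules exhausted")


def broken_primary(x: int) -> int:
    """Hypothetical primary module whose self-check always fails."""
    raise ModuleFault("self-check failed")


def spare(x: int) -> int:
    """Hypothetical spare module performing the same computation."""
    return x * 2


print(run_with_spares([broken_primary, spare], 21))  # (42, 1)
```

Compared with fault masking, only one module consumes resources at a time, but the approach depends on the fault being detected; an undetected wrong result is passed through unchanged.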



Fig (1): Types of Fault Tolerance 


2. Software fault tolerance 

Software faults (programming faults) can be tolerated using static and dynamic methods similar to those used for hardware faults. One of these methods is N-version programming, which uses static redundancy in the form of independent programs that all perform the same function. Another method, design diversity, combines software and hardware fault tolerance by implementing a fault-tolerant computer system with hardware and software in redundant channels [10]. The main goal of the design diversity technique is to tolerate both hardware and software faults, but it is very expensive [1].
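N-version programming can be sketched as several independently written implementations of the same specification whose results are voted on. The example below is illustrative: integer square root is an arbitrary choice of specification, the three "versions" are simply three different algorithms written by one author (real N-version programming relies on independent development teams), and isqrt_faulty deliberately stands in for a version with a design fault.

```python
import math
from collections import Counter
from typing import Callable, List


def isqrt_stdlib(n: int) -> int:
    """Version 1: the standard library (Python 3.8+)."""
    return math.isqrt(n)


def isqrt_newton(n: int) -> int:
    """Version 2: integer Newton's method."""
    if n < 2:
        return n
    x, y = n, (n + 1) // 2
    while y < x:
        x, y = y, (y + n // y) // 2
    return x


def isqrt_bisect(n: int) -> int:
    """Version 3: binary search for the largest r with r * r <= n."""
    lo, hi = 0, n
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid * mid <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo


def n_version(versions: List[Callable[[int], int]], n: int) -> int:
    """Run every version on the same input and accept the majority result."""
    results = [v(n) for v in versions]
    value, votes = Counter(results).most_common(1)[0]
    if votes <= len(versions) // 2:
        raise RuntimeError(f"no majority among {results}")
    return value


def isqrt_faulty(n: int) -> int:
    """A deliberately wrong version, standing in for a design fault."""
    return isqrt_bisect(n) + 1


print(n_version([isqrt_stdlib, isqrt_newton, isqrt_bisect], 10**6))  # 1000
print(n_version([isqrt_stdlib, isqrt_newton, isqrt_faulty], 99))     # 9
```

The scheme only helps if the versions fail independently; versions that share the same design mistake will outvote the correct one, which is why diversity in design is the costly but essential ingredient.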

5. Downside of a Fault Tolerant System 

According to [11], the following is a short list and brief description of the disadvantages of fault-tolerant design:

Masking or obscuring low-level failures 

The nature of a fault-tolerant design is to continue to operate normally even when a component fails. Thus, if the ability to detect a component failure relies on a loss of function or capability, it may be difficult to detect the failure. This sets the stage for a second component failure to cause a system-down event. Being able to detect individual component failures permits the repair or replacement of faulty elements, restoring the system to full fault tolerance capability [11].
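One way to keep a masked failure visible is to have the voter report which replicas disagreed with the majority. The sketch below assumes replica outputs can be compared for equality; vote_and_report is a hypothetical name. The caller still receives the majority value, but the list of dissenting replicas lets an operator repair or replace the faulty element before a second failure removes the majority.

```python
from collections import Counter
from typing import Hashable, List, Tuple


def vote_and_report(outputs: List[Hashable]) -> Tuple[Hashable, List[int]]:
    """Majority vote that also reports dissenting replicas.

    The failure is still masked (the majority value is returned), but the
    indices of replicas that disagreed are exposed for repair."""
    value, votes = Counter(outputs).most_common(1)[0]
    if votes <= len(outputs) // 2:
        raise RuntimeError("no majority: too many replicas have failed")
    dissenters = [i for i, out in enumerate(outputs) if out != value]
    return value, dissenters


print(vote_and_report([42, 42, 7]))  # (42, [2])
```

With three replicas, a result such as [42, 7, 8] has no majority and raises an error, which is exactly the second-failure situation described above.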

Increase in testing challenges 

Similar to the inability to detect some single-point failures, the ability to test the functionality and parametric values of components is also limited by the nature of the fault-tolerant design. It may require additional test functionality to be designed into the system, further adding to the complexity of the system [11].
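The kind of additional test functionality mentioned here can be sketched as a fault-injection hook. In this hypothetical example, FaultInjector wraps a replica and, when armed, corrupts its output so that tests can exercise the fault-handling path on demand; the square function and the "+1" corruption are arbitrary choices for illustration.

```python
from collections import Counter
from typing import Callable, List


class FaultInjector:
    """Test hook wrapped around a replica: when `armed` is True, the
    replica's output is passed through `corrupt` before being returned."""

    def __init__(self, replica: Callable[[int], int],
                 corrupt: Callable[[int], int] = lambda y: y + 1):
        self.replica = replica
        self.corrupt = corrupt
        self.armed = False

    def __call__(self, x: int) -> int:
        y = self.replica(x)
        return self.corrupt(y) if self.armed else y


def voted(replicas: List[Callable[[int], int]], x: int) -> int:
    """Majority vote over the replicas' outputs."""
    value, votes = Counter(r(x) for r in replicas).most_common(1)[0]
    if votes <= len(replicas) // 2:
        raise RuntimeError("no majority")
    return value


def square(x: int) -> int:
    """Hypothetical replicated computation."""
    return x * x


replicas = [FaultInjector(square) for _ in range(3)]
replicas[0].armed = True    # inject a fault into one replica
print(voted(replicas, 6))   # 36: the vote masks the injected fault
```

Because the voted output is unchanged, a black-box test cannot distinguish this run from a fault-free one; the test has to reach inside the system through the hook, which is the added complexity the text warns about.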

Increase in cost, weight, and complexity 

Redundancy, error checking, and fault isolation designs, for example, add components and logic elements to a system. The added components increase weight, board size, and power requirements. They also add complexity through the parallel and complex circuitry and logic required to detect and (functionally speaking) ignore single-point failures. More parts and more complexity mean additional cost [11].

Reduction in emphasis on improving component or 
subsystem reliability 

The design team may not focus on improving the inherent reliability of the elements of a fault-tolerant system. This tends to occur because the priority is on identifying single-point failures and creating a design that is resilient enough to continue operating. The focus is on system availability, not necessarily on system reliability [11].

Increase in acceptance of inferior components 

Similar to the loss of focus on inherent reliability, the team may accept lower-cost, inferior components despite the increasing frequency of component failures. Again, because the focus is on system availability and on robustness even when components fail, component quality loses priority as long as the design demonstrates its ability to meet fault-tolerance requirements [11].

Increase in support and maintenance expenses 

The lack of focus on reliability and the increased use of inferior components cause an increase in component-level failures. These failures then require replacement and repair of the affected systems, which increases the cost of operating the system.

Fault-tolerant design suits specific applications where the benefits justify the added cost, weight, and complexity, along with the other downsides of this approach. A good team will focus on system availability, the cost of operation and maintenance, and the inherent reliability of the individual elements [11].

6. Conclusion 

Cloud computing has become a widely used and very popular computing technology today. It must be reliable and available for users, which requires proven fault tolerance methods that can handle any kind of fault in every part of the system. Fault tolerance techniques are used to anticipate these failures and take the necessary actions before damage occurs. Reliability and availability are two important parameters in cloud computing. Therefore, we need fault tolerance methods that prepare the services provided in cloud computing for the resulting faults and failures. There are a number of fault-tolerant techniques in the cloud; this paper tries to investigate a proper and efficient model that covers most aspects of fault tolerance in cloud computing. In the future, we also expect to better understand the types of faults in hardware, software, and cloud infrastructure, and to provide other architecture models with higher fault tolerance, higher reliability and availability, and better performance.